feat: non-local redundancy #4491
Conversation
Just noting the points that make no sense as I try to understand the upcoming redundancy improvements.
pkg/api/bzz.go
Outdated
Cache                 *bool           `map:"Swarm-Cache"`
Strategy              getter.Strategy `map:"Swarm-Redundancy-Strategy"`
FallbackMode          bool            `map:"Swarm-Redundancy-Fallback-Mode"`
ChunkRetrievalTimeout time.Duration   `map:"Swarm-Chunk-Retrieval-Timeout"`
what is the purpose of this?
@istae Configuring this counts as an advanced dev option.
By setting the chunk retrieval timeout to a sufficiently short interval one can effectively render some slow chunks not retrievable and thus 'simulate' chunk loss even in environments of perfect availability.
This is especially useful since it allows for client-side triggering of the redundancy-based recovery of data implemented in this PR.
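For illustration, a minimal client-side sketch of that trick; only the header names come from the hunk above, while the endpoint, node address, reference placeholder and header values are assumptions:

```go
// illustrative sketch: download through a local bee node with an aggressively
// short chunk retrieval timeout to force timeouts and exercise redundancy recovery.
package main

import (
	"io"
	"net/http"
	"os"
)

func main() {
	// "<reference>" is a placeholder; 1633 is assumed to be the node's API port.
	req, err := http.NewRequest(http.MethodGet, "http://localhost:1633/bzz/<reference>/", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Swarm-Redundancy-Strategy", "1")          // header names from the hunk above
	req.Header.Set("Swarm-Redundancy-Fallback-Mode", "true")  // allow fallback to recovery
	req.Header.Set("Swarm-Chunk-Retrieval-Timeout", "100ms")  // short timeout to 'simulate' chunk loss
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()
	_, _ = io.Copy(os.Stdout, res.Body)
}
```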
ctx = getter.SetStrategy(ctx, headers.Strategy)
ctx = getter.SetStrict(ctx, headers.FallbackMode)
ctx = getter.SetFetchTimeout(ctx, headers.ChunkRetrievalTimeout)
reader, l, err := joiner.New(ctx, s.storer.Download(cache), s.storer.Cache(), reference)
storer.Cache() is not a reliable source for storing chunks for any operation; what is the purpose of using the cache here?
what should I put here? there is no other putter available under s.storer
what is the purpose of using the cache here?
so the cache must be used to store things locally that have no postage stamp. There are three ways to get there:
- evicted from the reserve (either too distant, too cheap, or expired)
- landed with us through retrieval and has no postage stamp (yet)
- and now also created with the help of parities, with no postage stamp available (yet).
Let's get to the bottom of this.
Reconstructed chunks have no postage stamp, therefore they cannot be in the reserve. Putting them in a pinstore is wasteful and unclear in terms of expiry.
The only possible place to put them is the localstore cache.
The status of reconstructed chunks is similar to that of chunks obtained through retrieval, which are also cached.
The only thing we absolutely need to make sure of is that these cached chunks can and will be put in the reserve once they are offered by peers as part of pullsyncing as chunks with a valid postage stamp.
There is one caveat about using the cache: the intricate scenario described and resolved in this HackMD: https://hackmd.io/@zelig/Bkqol64dp
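A minimal sketch of this idea, with assumed names only (the putter stands in for whatever cache putter the joiner receives, e.g. s.storer.Cache() in the hunk above; this is not the PR's actual code):

```go
// illustrative sketch only: persist a reconstructed chunk via the cache putter.
package recovery

import (
	"context"

	"github.com/ethersphere/bee/pkg/storage"
	"github.com/ethersphere/bee/pkg/swarm"
)

// saveRecovered is a hypothetical helper: it wraps recovered shard data as a
// chunk and puts it into the local cache, from where it can later be moved to
// the reserve once a validly stamped copy is offered by peers via pullsync.
func saveRecovered(ctx context.Context, cache storage.Putter, addr swarm.Address, data []byte) error {
	ch := swarm.NewChunk(addr, data)
	return cache.Put(ctx, ch)
}
```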
An even worse consideration is that the cache size is user-specified and can therefore be set to an arbitrarily small value, possibly ensuring that the reconstructed chunks have already been purged before they are needed again.
I am introducing a recoder accordingly
resolved in #4529
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// the code below implements the integration of dispersed replicas in chunk fetching.
Still unresolved.
wg.Wait()
close(errc)
for err := range errc {
	errs = append(errs, err)
Still unresolved.
}
defer cancelAll()
run := func(s Strategy) error {
	if s == PROX { // NOT IMPLEMENTED
what happens if the strategy is DATA? because the strategy is incremented below, in cases where the strategy is DATA no other strategy is tried, since PROX is not implemented?
var stop <-chan time.Time
if s < RACE {
	timer := time.NewTimer(strategyTimeout)
max time allowed to fetch all the chunks from the network is 500 ms?
	continue
default:
}
_ = g.fly(i, true) // commit (RS) or will commit to retrieve the chunk
why does the missing call set the chunks in flight?
// Get will call parities and other sibling chunks if the chunk address cannot be retrieved
// assumes it is called for data shards only
func (g *decoder) Get(ctx context.Context, addr swarm.Address) (swarm.Chunk, error) {
	i, ok := g.cache[addr.ByteString()]
is this cache even necessary? would the Get call receive an addr not part of this decoder??
haha, so the addr that the Get receives is the address of a chunk within the scope of its parent (packed address chunk = intermediate chunk).
The decoders for every parent scope are cached in the joiner.
This cache here is actually an index mapping addresses (children of the parent) to positions.
It should probably be renamed.
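A minimal sketch of that reading, with illustrative names only (not the PR's identifiers): the field is an address-to-shard-position index built over the parent's children.

```go
// illustrative sketch: an index from sibling chunk address to shard position,
// which is what the decoder's "cache" field appears to be.
package redundancy

import "github.com/ethersphere/bee/pkg/swarm"

// buildShardIndex maps each child address of an intermediate chunk to its position.
func buildShardIndex(addrs []swarm.Address) map[string]int {
	idx := make(map[string]int, len(addrs))
	for pos, a := range addrs {
		idx[a.ByteString()] = pos
	}
	return idx
}
```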
if !ok {
	return nil, storage.ErrNotFound
}
if g.fly(i, true) {
the prefetch call essentially sets all of these chunks to inflight, no?
why is this fly check necessary, why can't we jump to the select below?
because we want singleflight behaviour on fetching, i.e. if prefetching fetches a chunk and it is then queried with joiner Get --> decoder Get (or the other way round), then we should just wait on the inflight fetch.
Similarly, when prefetch has fetched shardCnt chunks, the other chunks can be put to inflight, so that if they are 'Get-ed' by the joiner they are not fetched but wait till recovered.
And of course there is not always a prefetch on every data chunk (NONE and PROX will not prefetch some chunks)
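A minimal sketch of the single-flight idea described above (names and signatures are illustrative, not the PR's actual code, where fly takes additional arguments): whichever of prefetch or Get wins an atomic compare-and-swap performs the fetch; the loser just waits on the per-shard channel.

```go
// illustrative sketch of single-flight fetching per shard.
package redundancy

import "sync/atomic"

type decoder struct {
	inflight []atomic.Bool   // one flag per shard index
	waits    []chan struct{} // closed when the shard's chunk is available (fetched or recovered)
}

// fly reports whether the caller acquired the right to fetch shard i.
func (d *decoder) fly(i int) bool {
	return d.inflight[i].CompareAndSwap(false, true)
}

// fetchOrWait either performs the fetch for shard i or waits for the one in flight.
func (d *decoder) fetchOrWait(i int) {
	if d.fly(i) {
		// this caller fetches the chunk (or recovery fills the slot and closes waits[i])
		return
	}
	<-d.waits[i] // someone else is already on it; wait for fetch or recovery to finish
}
```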
// if all chunks are retrieved, signal ready
n := g.fetchedCnt.Add(1)
if n == int32(g.shardCnt) {
	close(g.ready) // signal that just enough chunks are retrieved for decoding
are we allowed to close ready if the shardCnt includes parity chunks?
surely
// Get will call parities and other sibling chunks if the chunk address cannot be retrieved
// assumes it is called for data shards only
func (g *decoder) Get(ctx context.Context, addr swarm.Address) (swarm.Chunk, error) {
what this Get should ideally do is wait for either the chunk channel g.waits[i] to finish OR for a signal that the recovery has finished (and the chunk is available).
there should also be an error channel that returns when a chunk is not fetchable AND unrecoverable, which would translate into a storage.ErrNotFound.
as it stands, this Get never properly returns an error, and simply waits for a context timeout in the case that recovery and/or fetch failed.
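A minimal sketch of the suggested behaviour (the recovered and failed channels, the index field and the chunks slice are illustrative assumptions, not the PR's API):

```go
// illustrative sketch: Get waits for the direct fetch, a recovery signal, an
// unrecoverable-failure signal, or context cancellation, and returns an error
// instead of hanging until the context times out.
package redundancy

import (
	"context"

	"github.com/ethersphere/bee/pkg/storage"
	"github.com/ethersphere/bee/pkg/swarm"
)

type decoder struct {
	index     map[string]int  // sibling address -> shard position
	waits     []chan struct{} // closed when the shard at that position is fetched
	recovered chan struct{}   // closed when RS recovery has completed
	failed    chan struct{}   // closed when the chunk is unfetchable AND unrecoverable
	chunks    []swarm.Chunk   // filled by fetch or recovery
}

func (g *decoder) Get(ctx context.Context, addr swarm.Address) (swarm.Chunk, error) {
	i, ok := g.index[addr.ByteString()]
	if !ok {
		return nil, storage.ErrNotFound
	}
	select {
	case <-g.waits[i]: // fetched directly
	case <-g.recovered: // became available through recovery
	case <-g.failed:
		return nil, storage.ErrNotFound
	case <-ctx.Done():
		return nil, ctx.Err()
	}
	return g.chunks[i], nil
}
```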
will review after conflicts are resolved
Along with the neighborhood reserve replication and node caching, Reed-Solomon erasure coding and dispersed replicas make up the redundancy in Bee.
With these new redundancies, the client can recover requested data from different neighborhoods of the Kademlia network.
Details are in the PR description.
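For readers unfamiliar with the erasure-coding side, here is a small self-contained sketch using the klauspost/reedsolomon library (an assumption for illustration; the PR's actual encoder, shard sizes and parameters differ): losing up to as many shards as there are parities still allows full reconstruction.

```go
// illustrative sketch of Reed-Solomon recovery: encode data shards with parities,
// drop some shards, then reconstruct the originals.
package main

import (
	"fmt"

	"github.com/klauspost/reedsolomon"
)

func main() {
	const dataShards, parityShards = 4, 2
	enc, err := reedsolomon.New(dataShards, parityShards)
	if err != nil {
		panic(err)
	}
	// six equally sized shards: four data, two parity (sizes are illustrative)
	shards := make([][]byte, dataShards+parityShards)
	for i := 0; i < dataShards; i++ {
		shards[i] = []byte{byte(i), byte(i + 1), byte(i + 2), byte(i + 3)}
	}
	for i := dataShards; i < dataShards+parityShards; i++ {
		shards[i] = make([]byte, 4)
	}
	if err := enc.Encode(shards); err != nil {
		panic(err)
	}
	// simulate the loss of up to parityShards shards
	shards[1], shards[4] = nil, nil
	if err := enc.Reconstruct(shards); err != nil {
		panic(err)
	}
	ok, _ := enc.Verify(shards)
	fmt.Println("reconstructed and verified:", ok)
}
```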
Open API Spec Version Changes (if applicable)
I think we need to change the version after #4529